Abstract

These notes introduce AVL trees and related algorithms. They
were written as supplementary material for a course in data structures
given by the Department of Computer Science of the University of Illinois
at Urbana-Champaign during the second semester of the 1970-71 academic
year. A background of §2.2 and §2.3 in The Art of Computer Programming
by Knuth (Addison-Wesley, 1968) is assumed.
3.74, 3.79 (AVL trees)


One  popular  method  for  storing  and  retrieving  information  by  its 
"name"  is  to  store  the  names  in  a  binary  tree.   Each  node  of  this  tree  would 
look  as  in  Figure  1, 


    +-------+------+-------+
    | LLINK | NAME | RLINK |
    +-------+------+-------+

            Figure 1.

where LLINK is a pointer to the left subtree of the node, NAME is the name
of the information (along with a pointer to the information), and RLINK
is a pointer to the right subtree of the node. Generally, the names are
stored in the tree in such a fashion that the postorder list of the nodes
(postorder in the sense of Knuth's 1968 text; the same traversal is now
usually called inorder or symmetric order) is an alphabetized list of the
names. For example, suppose the information to be stored is the
documentation for various programming languages; the tree might be
constructed as in Figure 2.


[Figure 2: a binary tree holding the names LISP, COBOL, BCPL, CUPL,
GEDANKEN, RUSH, NUCLEOL, SNOBOL, and FORTRAN.]

Figure 2.


To find a given name in the tree we perform the following algorithm,
which is closely related to postorder (symmetric-order) traversal of the
nodes:

Algorithm S

Let T be a pointer to the tree to be searched and let
ITEM be the name for which we are searching.

Step 1 (initialize)

    Set P ← T

Step 2 (is it the root?)

    If P = Λ then the name we are looking for is not in
    the tree and we are done. If NAME(P) = ITEM then we have found
    the name and we are done.

Step 3 (Go down one level)

    If NAME(P) > ITEM set P ← LLINK(P).
    If NAME(P) < ITEM set P ← RLINK(P).
    Go back to Step 2.
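
Algorithm S can be sketched in Python as follows. The Node class, the
function name search, and the small sample tree are assumptions made for
illustration (the sample tree is not Figure 2); None plays the role of Λ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None  # left subtree (LLINK)
    rlink: Optional["Node"] = None  # right subtree (RLINK)


def search(t: Optional[Node], item: str) -> Optional[Node]:
    """Algorithm S: return the node named item, or None if it is absent."""
    p = t                          # Step 1: P <- T
    while p is not None:           # Step 2: P = Λ means the name is absent
        if p.name == item:         # Step 2: the name has been found
            return p
        # Step 3: go down one level, left if NAME(P) > ITEM, else right
        p = p.llink if p.name > item else p.rlink
    return None


# A small hypothetical tree with FORTRAN at the root.
tree = Node("FORTRAN", Node("COBOL"), Node("LISP", None, Node("SNOBOL")))
```

Each pass through the loop costs one level of the tree, so the number of
comparisons is bounded by the number of levels.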

One special case of this method is the binary search and its variants.
It is clear that if we apply Algorithm S to the tree in Figure 2
then at most five comparisons will be needed to determine whether or not a
given name is in the tree. On the other hand, had the tree structure been
as shown in Figure 3, as many as eight comparisons might have been required.
Clearly the structure of the tree in Figure 2 is more desirable.

In general, it is desirable to have the tree structure as "balanced"
as possible. Thus from the criterion of searching the tree, the optimal
structure for n names, assuming each name is equally probable, would be the
complete binary tree of n nodes; then Algorithm S is a binary search. (See
page 401 in Volume 1 of Knuth.) However, the names to be stored are usually
dynamic in nature and so one must allow for frequent additions and deletions
to the trees; if we require the tree to be completely balanced, then the
addition or deletion of a name may require the complete restructuring of
the tree. For example, the completely balanced tree for the names in the
trees given in Figures 2 and 3 is given in Figure 4.
Figure 4.


Adding a new language name, say AMBIT, would require a complete
restructuring of the tree to obtain the tree illustrated in Figure 5.

Figure 5.

Clearly there is a tradeoff between the maximum time required to
add and delete items and the maximum time required to retrieve an item. To
summarize, if there are n names then the minimal retrieval time is
approximately log2 n comparisons for a balanced tree, but addition and
deletion of nodes require a complete reorganization of the tree, taking a
number of steps proportional to n. On the other hand, if the tree is
allowed to grow randomly, so that addition and deletion of nodes is
facilitated, then retrieval can take as many as n comparisons in the worst
case (about n/2 on average when the tree degenerates into a single chain).
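
The effect of unrestricted growth can be seen in a small experiment. The
sketch below adds names as new leaves with no rebalancing; the names
naive_insert, levels, and build, and the two insertion orders, are
assumptions chosen for illustration.

```python
class Node:
    def __init__(self, name):
        self.name, self.llink, self.rlink = name, None, None


def naive_insert(root, name):
    """Add name as a new leaf with no rebalancing; return the root."""
    if root is None:
        return Node(name)
    p = root
    while p.name != name:
        side = "llink" if name < p.name else "rlink"
        if getattr(p, side) is None:
            setattr(p, side, Node(name))   # found the empty link: add the leaf
        p = getattr(p, side)
    return root


def levels(t):
    """Number of levels in the tree; the empty tree has none."""
    return 0 if t is None else 1 + max(levels(t.llink), levels(t.rlink))


def build(names):
    root = None
    for name in names:
        root = naive_insert(root, name)
    return root


alphabetical = ["AMBIT", "COBOL", "CUPL", "FORTRAN", "GEDANKEN",
                "LISP", "NUCLEOL", "RUSH", "SNOBOL"]
mixed = ["GEDANKEN", "COBOL", "NUCLEOL", "AMBIT", "CUPL",
         "LISP", "RUSH", "FORTRAN", "SNOBOL"]
print(levels(build(alphabetical)))  # 9: the tree is a single chain
print(levels(build(mixed)))         # 4: as shallow as nine nodes allow
```

The same nine names give a search cost of nine comparisons or four,
depending only on the order in which they arrived.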


AVL Trees

About ten years ago Adel'son-Vel'skii and Landis proposed a tree
structure which provides a good compromise between the two extremes of
complete balancing and unrestricted growth. Their structures, now known as
AVL trees, are binary trees which have the property that the heights of the
two subtrees of any node in the tree differ by at most one. For example,
the following trees are not AVL:


On the other hand, the following trees are AVL:
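
The defining property can also be checked mechanically. The following is a
minimal sketch, assuming a tree is represented as None (empty) or a tuple
(left, name, right); the representation, the function names, and the three
sample trees are illustrative assumptions, not part of the notes.

```python
def height(tree):
    """Number of levels in tree; the empty tree (None) has height 0."""
    if tree is None:
        return 0
    left, _name, right = tree
    return 1 + max(height(left), height(right))


def is_avl(tree):
    """True if at every node the two subtree heights differ by at most one."""
    if tree is None:
        return True
    left, _name, right = tree
    return (abs(height(left) - height(right)) <= 1
            and is_avl(left) and is_avl(right))


def leaf(name):
    return (None, name, None)


chain = (None, "A", (None, "B", leaf("C")))          # heights 0 and 2 at the root
bushy = (leaf("A"), "B", leaf("C"))                  # heights 1 and 1
lopsided = (leaf("A"), "B", (None, "C", leaf("D")))  # heights 1 and 2

print(is_avl(chain), is_avl(bushy), is_avl(lopsided))  # False True True
```

This check recomputes heights at every node and is meant only to state the
definition precisely; the insertion algorithm later in these notes avoids
such recomputation by storing a condition indicator in each node.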


To discover an upper bound on the number of comparisons required
to retrieve a name in the worst case, we calculate the least number of
elements required to form an AVL tree of k levels. Such a tree is sometimes
called a mintree and will be symbolized M_k. A mintree of k levels may be
constructed by taking one item as the root of the tree and placing a mintree
of height k-1 as one subtree and a mintree of height k-2 as the other
subtree (compare this with Fibonacci trees). Counting the elements of this
tree we get a recursive formula for N_k, the number of items in M_k:


    N_k = N_{k-1} + N_{k-2} + 1.


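With N_1 = 1 and N_2 = 2, this equation can be solved by standard
techniques, and the solution is


    N_k = (1 + 2/\sqrt{5}) ((1 + \sqrt{5})/2)^{k-1}
          + (1 - 2/\sqrt{5}) ((1 - \sqrt{5})/2)^{k-1} - 1,


in which the second term tends to zero as k grows.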


Thus the height, h, of an AVL tree with N_k nodes is bounded above by


    h < (3/2) \log_2 (N_k + 1) - 1,
and so the maximum number of comparisons to retrieve a name is about
(3/2) log2 n, where n is the number of names in the tree. So the number of
comparisons required in the worst case for retrieval in AVL trees compares
favorably with the number required for completely balanced trees.
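
The mintree recurrence is easy to tabulate. The sketch below assumes the
base cases N_0 = 0 (the empty tree) and N_1 = 1 (a single node), which the
text leaves implicit, and checks the values against the Fibonacci numbers
using the identity N_k = F_{k+2} - 1; the function names are hypothetical.

```python
def mintree_sizes(levels):
    """Return [N_1, ..., N_levels] from N_k = N_{k-1} + N_{k-2} + 1."""
    sizes, smaller, current = [], 0, 1      # N_0 = 0, N_1 = 1 (assumed)
    for _ in range(levels):
        sizes.append(current)
        smaller, current = current, current + smaller + 1
    return sizes


def fibonacci(n):
    """F_0 = 0, F_1 = 1, F_n = F_{n-1} + F_{n-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


sizes = mintree_sizes(10)
print(sizes)  # [1, 2, 4, 7, 12, 20, 33, 54, 88, 143]
print(all(n == fibonacci(k + 2) - 1 for k, n in enumerate(sizes, start=1)))  # True
```

Because N_k grows geometrically in k, the number of levels of an AVL tree
grows only logarithmically in the number of nodes.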


Addition  of  Nodes  to  AVL  Trees 

In order for AVL trees to be useful, we must show that it is not
too difficult for the tree to change dynamically. This section gives an
algorithm requiring time proportional to log2 n to insert a new node in an
AVL tree of n nodes; the difficulty in this problem is clearly the fact
that the tree must be an AVL tree after the insertion of the new node.

We will suppose that we have been given an AVL tree, T, and a
new name, N, to be added to that tree. When Algorithm S is applied to find
N in T, it terminates at step 2 with P = Λ, where P is the left or right
link of some node in T; it is at this point in the tree that N should be
added as a new leaf, provided that the augmented tree would still be an AVL
tree.

In  any  AVL  tree  exactly  one  of  three  possible  conditions  will 
be  true  at  any  node  of  the  tree: 

C1.  The heights of the left and right subtrees of the node
are equal.

C2.  The  height  of  the  right  subtree  of  the  node  is  one 
greater  than  the  height  of  the  left  subtree  of  the 
node. 

C3.   The  height  of  the  right  subtree  of  the  node  is  one  less 
than  the  height  of  the  left  subtree  of  the  node. 
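
As a small illustration, the three conditions can be written as a function
of the two subtree heights. The function name and the use of the integers
1, 2, 3 as return values are assumptions made here for the sketch.

```python
def condition(left_height, right_height):
    """Return 1, 2 or 3 according as C1, C2 or C3 holds at a node."""
    difference = right_height - left_height
    if difference == 0:
        return 1      # C1: the two heights are equal
    if difference == 1:
        return 2      # C2: the right subtree is one higher
    if difference == -1:
        return 3      # C3: the right subtree is one lower
    raise ValueError("heights differ by more than one: not an AVL node")
```

Any other difference in heights means the node violates the AVL property,
which is why the function refuses to classify it.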

The algorithm for the insertion of a new node will insert that
node in the position where Algorithm S discovered it was not in the tree;
that is, it will be added as the son of a node already in the tree. We
will refer to the tree with the node added as the augmented tree. If this
new node was added as the left (right) son of a node which previously
had only a right (left) son, then the tree is clearly still an AVL tree
and no more has to be done. On the other hand, if this new node was
added as the son (right or left) of what was previously a leaf, we may
have destroyed the AVL property by increasing the length of a subtree. The
crucial part of the insertion algorithm is to discover if the AVL property
of the tree has been ruined and if so to restructure the tree to regain
the AVL property.

The augmented tree is tested for the AVL property by going
up the path from the newly added node to the root of the tree. This
is facilitated in the algorithm by storing that path in a push
down stack as the tree is being searched to find where the node is to
be added. The construction of such a stack is a simple, inexpensive task,
for in an AVL tree of a billion nodes a maximum of about fifty stack entries
will be needed (see Exercise 3).
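
The size of the stack can be estimated from the mintree recurrence, since
the path can be no longer than the number of levels in the tree. The sketch
below assumes the base cases N_0 = 0 and N_1 = 1; max_levels is a
hypothetical helper, not part of Algorithm I.

```python
def max_levels(n):
    """Largest k such that an AVL tree of k levels can have n or fewer nodes.

    Uses N_k = N_{k-1} + N_{k-2} + 1 with N_0 = 0 and N_1 = 1 (assumed).
    """
    k, smaller, current = 0, 0, 1    # smaller = N_k, current = N_{k+1}
    while current <= n:
        k += 1
        smaller, current = current, current + smaller + 1
    return k


print(max_levels(10 ** 9))  # 42
```

Under these assumptions a billion-node AVL tree has at most 42 levels,
which is comfortably within the fifty entries mentioned above.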

As we follow the path up from the newly added node, we must check
at each node which of C1, C2, or C3 holds. This will be done by having each
node contain, in addition to the NAME, LLINK, and RLINK, a field called
COND which indicates which condition holds. As the path is traced upward
from the new node, in addition to checking the conditions, we will also
update them to take the new node into account. This is where the
difficulties arise: What if in adding the new node we have lengthened a
subtree which was already one longer than its brother subtree? At this
point we must restructure the tree.

When we consider the existing left-right symmetries, we find that
the only ways the tree can be unbalanced are given in the three cases below.

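Case 1:

[Figure: a node whose only son was a leaf, with the new node r added as a
son of that leaf.]

Here the new node, r, has been added to a leaf of the tree,
causing a subtree containing that node to be too long. The tree is
rebalanced by changing it to

[Figure: the same three nodes, with the middle name at the root and the
other two nodes as its left and right sons.]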


Case 2:

[Figure: node p with right subtree C of height n; the left son of p has
left subtree A of height n+1 and right subtree B of height n.]

Here the new node has been added in subtree A, causing the height
of the left subtree of p to be n+2 while the height of the right subtree is
n (see Exercise 4). The tree is rebalanced by changing it to

[Figure: the former left son of p is now the root of this subtree, with
left subtree A of height n+1 and right son p; the subtrees of p are B and
C, each of height n.]
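
The Case 2 restructuring is what is now usually called a single rotation.
A minimal sketch follows, assuming a Node class with llink and rlink
fields; the names rotate_right, height, and inorder, and the example with
n = 1, are illustrative assumptions, and the COND updates are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def height(t):
    return 0 if t is None else 1 + max(height(t.llink), height(t.rlink))


def inorder(t):
    return [] if t is None else inorder(t.llink) + [t.name] + inorder(t.rlink)


def rotate_right(p):
    """Case 2: lift the left son of p; return the new root of the subtree."""
    q = p.llink
    p.llink = q.rlink    # subtree B moves under p
    q.rlink = p          # p becomes the right son of q
    return q


# n = 1: A (rooted at "B") has height 2, B ("D") and C ("F") have height 1.
p = Node("E", Node("C", Node("B", Node("A")), Node("D")), Node("F"))
root = rotate_right(p)
print(root.name, height(root.llink), height(root.rlink))  # C 2 2
print(inorder(root))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The alphabetical order of the names is unchanged, and the subtree has the
same height it had before the new node was added, so nothing above it needs
to be examined.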


Case 3:

[Figure: node p with right subtree D of height n+1; the left son of p has
left subtree A of height n+1, and its right son has left subtree B of
height n+1 and right subtree C of height n.]

Here the new node has been added in subtree B and so the height of
the left subtree of p is n+3 while the height of the right subtree is n+1.
The tree is rebalanced by changing it to

[Figure: the node that was the father of B and C is now the root of this
subtree; its left son is the former left son of p, with subtrees A of
height n+1 and B of height n+1; its right son is p, with subtrees C of
height n and D of height n+1.]


(Why are these three cases exhaustive? In Case 3, what if the new
node was added to subtree C? See Exercise 5.)
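
The Case 3 restructuring is what is now usually called a double rotation.
The sketch below makes the same assumptions as any pointer-based tree (a
Node class with llink and rlink fields); double_rotate_right, height, and
inorder are hypothetical names, the example takes n = 0, and the COND
updates are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def height(t):
    return 0 if t is None else 1 + max(height(t.llink), height(t.rlink))


def inorder(t):
    return [] if t is None else inorder(t.llink) + [t.name] + inorder(t.rlink)


def double_rotate_right(p):
    """Case 3: lift the right son of p's left son; return the new root."""
    q = p.llink
    r = q.rlink
    q.rlink = r.llink    # subtree B moves under q
    p.llink = r.rlink    # subtree C moves under p
    r.llink = q
    r.rlink = p
    return r


# n = 0: A ("A"), B ("C") and D ("G") have height 1; C is empty.
p = Node("F", Node("B", Node("A"), Node("D", Node("C"))), Node("G"))
root = double_rotate_right(p)
print(root.name, height(root.llink), height(root.rlink))  # D 2 2
print(inorder(root))  # ['A', 'B', 'C', 'D', 'F', 'G']
```

As with the single rotation, the alphabetical order is preserved and the
subtree returns to the height it had before the insertion.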

On the basis of the above description, we now give Algorithm I to
insert a new node into an AVL tree, and restructure that tree (if need be)
to keep it AVL. The algorithm will assume nodes of the form indicated in
Figure 6,

    +------+------+-------+-------+
    | NAME | COND | LLINK | RLINK |
    +------+------+-------+-------+

                Figure 6.

where NAME, LLINK, and RLINK are as in Figure 1, and COND is equal to one,
two, or three according as C1, C2, or C3 holds at that node.


Algorithm  I:   Insertion  into  AVL  trees 

NEW  is  the  name  to  be  added  to  the  AVL  tree  pointed  to  by  T.   P  will 
be  used  in  the  search  through  the  tree  to  find  out  where  NEW  should  be  added; 
PP  is  always  one  step  behind  P  in  the  tree,  that  is,  PP  will  point  to  the 
father  of  the  node  pointed  to  by  P.   PATH  is  a  push  down  stack  used  to  store 
the  nodes  of  the  tree  on  the  path  from  the  newly  added  node  up  to  the  root. 
Each  element  on  the  stack  is  an  ordered  pair  (p,q)  where  p  is  a  pointer  to  the 
node  and  q  is  either  "L"  or  "R",  indicating  which  son  of  node  p  is  followed  in 
going  down  the  path. 

This algorithm uses the notation a ← b ← c to represent assigning the
value of c to both a and b. Also, if S is a variable whose value is either
"L" or "R" then S-LINK is either LLINK or RLINK according to the value of S.
For example, when S = "L",

    S-LINK(P) ← P

has the same effect as

    LLINK(P) ← P.
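
In a modern language the S-LINK notation amounts to choosing a field by
name at run time. A minimal Python sketch, in which the Node class, the
FIELD table, and the helpers s_link and set_s_link are all assumptions made
for illustration:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.llink = None
        self.rlink = None


FIELD = {"L": "llink", "R": "rlink"}   # which field S-LINK stands for


def s_link(p, s):
    """Value of S-LINK(P)."""
    return getattr(p, FIELD[s])


def set_s_link(p, s, value):
    """S-LINK(P) <- value."""
    setattr(p, FIELD[s], value)


p = Node("LISP")
set_s_link(p, "L", Node("COBOL"))   # same effect as p.llink = Node("COBOL")
print(p.llink.name, s_link(p, "R"))  # COBOL None
```

Writing the link this way lets one piece of code serve both a case and its
left-right mirror image.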


Step 1 (initialize)

    Set P ← T
        PATH ← PP ← Λ

Step 2 (Save the path)

    If PP ≠ Λ then PATH ⇐ (PP,S)

Step 3 (Have we found the place?)

    If P = Λ, we have found the place to add the new node: go to Step 5.

Step 4 (Go down the tree)

    If NAME(P) = NEW then the name is already in the tree so it needn't
    be added and we are done.

    Otherwise set PP ← P and

        if NAME(PP) < NEW set P ← RLINK(PP)
                              S ← "R"

        if NAME(PP) > NEW set P ← LLINK(PP)
                              S ← "L"

    Go back to Step 2.

Step 5 (Add new node to the tree)

    X ⇐ AVAIL
    NAME(X) ← NEW
    COND(X) ← 1
    LLINK(X) ← RLINK(X) ← Λ
    S-LINK(PP) ← X   (if PP = Λ the tree was empty; set T ← X instead)

Step 6 (Update condition indicators)

    If PATH = Λ, we are done. Otherwise, (P,S) ⇐ PATH and do as
    specified in the table below.

    COND(P) =   S = "R"                        S = "L"
    ---------   ----------------------------   ----------------------------
        1       COND(P) ← 2; repeat Step 6     COND(P) ← 3; repeat Step 6
        2       go to Step 7 to rebalance      COND(P) ← 1 and we are done
                the tree
        3       COND(P) ← 1 and we are done    go to Step 7 to rebalance
                                               the tree

Step 7 (Case 1 and its symmetric variants)

    A test below is taken to fail if any link it follows is Λ; if all
    four tests fail, go on to Step 8.

    If NAME(LLINK(LLINK(P))) = NEW then
        COND(P) ← COND(LLINK(P)) ← COND(LLINK(LLINK(P))) ← 1
        RLINK(LLINK(P)) ← P
        if PATH = Λ, set T ← LLINK(P), otherwise
            (F,S) ⇐ PATH
            S-LINK(F) ← LLINK(P)
        LLINK(P) ← Λ and we are done.

    If NAME(RLINK(LLINK(P))) = NEW then
        COND(P) ← COND(LLINK(P)) ← COND(RLINK(LLINK(P))) ← 1
        LLINK(RLINK(LLINK(P))) ← LLINK(P)
        RLINK(RLINK(LLINK(P))) ← P
        if PATH = Λ, set T ← RLINK(LLINK(P)), otherwise
            (F,S) ⇐ PATH
            S-LINK(F) ← RLINK(LLINK(P))
        RLINK(LLINK(P)) ← Λ
        LLINK(P) ← Λ and we are done.

    If NAME(LLINK(RLINK(P))) = NEW then
        COND(P) ← COND(RLINK(P)) ← COND(LLINK(RLINK(P))) ← 1
        LLINK(LLINK(RLINK(P))) ← P
        RLINK(LLINK(RLINK(P))) ← RLINK(P)
        if PATH = Λ, set T ← LLINK(RLINK(P)), otherwise
            (F,S) ⇐ PATH
            S-LINK(F) ← LLINK(RLINK(P))
        LLINK(RLINK(P)) ← Λ
        RLINK(P) ← Λ and we are done.

    If NAME(RLINK(RLINK(P))) = NEW then
        COND(P) ← COND(RLINK(P)) ← COND(RLINK(RLINK(P))) ← 1
        LLINK(RLINK(P)) ← P
        if PATH = Λ, set T ← RLINK(P), otherwise
            (F,S) ⇐ PATH
            S-LINK(F) ← RLINK(P)
        RLINK(P) ← Λ and we are done.

Step  8  (Case  2  and  its  symmetric  variants) 
See  Exercise  6. 

Step 9 (Case 3 and its symmetric variants)
See  Exercise  7. 

It can be shown that Algorithm I requires time proportional to log2 n
in the worst case (see Exercise 9). In addition, it can be shown that the
expected time is also proportional to log n (see Exercise 10).
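
For readers who want to experiment, here is one possible Python rendering
of the whole insertion procedure. It is a sketch under stated assumptions,
not the text of Algorithm I: it keeps the PATH stack and the COND codes 1,
2, 3, but it handles Cases 1 to 3 with two general rotations instead of the
separate Steps 7 to 9, and the names Node, link, set_link, rebalance, and
insert are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cond: int = 1                      # 1: equal, 2: right higher, 3: left higher
    llink: Optional["Node"] = None
    rlink: Optional["Node"] = None


def link(p, s):
    return p.llink if s == "L" else p.rlink


def set_link(p, s, x):
    if s == "L":
        p.llink = x
    else:
        p.rlink = x


def rebalance(p, s):
    """The s side of p was already higher and has grown; return the new root."""
    o = "R" if s == "L" else "L"                    # the opposite side
    heavy, light = (3, 2) if s == "L" else (2, 3)   # codes: s side / o side higher
    q = link(p, s)
    if q.cond == heavy:                # new node on the outer side: single rotation
        set_link(p, s, link(q, o))
        set_link(q, o, p)
        p.cond = q.cond = 1
        return q
    r = link(q, o)                     # new node on the inner side: double rotation
    set_link(q, o, link(r, s))
    set_link(p, s, link(r, o))
    set_link(r, s, q)
    set_link(r, o, p)
    p.cond = light if r.cond == heavy else 1
    q.cond = heavy if r.cond == light else 1
    r.cond = 1
    return r


def insert(t, new):
    """Insert the name new into the AVL tree t and return the (new) root."""
    path = []                          # stack of (node, "L" or "R") pairs
    p = t
    while p is not None:               # search, saving the path
        if p.name == new:
            return t                   # the name is already in the tree
        s = "R" if p.name < new else "L"
        path.append((p, s))
        p = link(p, s)
    x = Node(new)                      # add the new leaf
    if not path:
        return x                       # the tree was empty
    set_link(path[-1][0], path[-1][1], x)
    while path:                        # update condition indicators
        p, s = path.pop()
        if p.cond == 1:
            p.cond = 2 if s == "R" else 3
            continue                   # this subtree grew: keep going up
        if (p.cond == 2) != (s == "R"):
            p.cond = 1                 # the shorter side grew: done
            return t
        root = rebalance(p, s)         # the longer side grew: restructure
        if not path:
            return root
        f, fs = path[-1]
        set_link(f, fs, root)
        return t
    return t
```

A tree is built by repeated calls of the form t = insert(t, name), starting
from t = None. At most one restructuring is performed per insertion, and
all other work is confined to the saved path.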
Exercises


1.  (M15) Prove that Algorithm S works correctly.

2.  (M10) Show that Fibonacci trees are among the most unbalanced
possible AVL trees. The Fibonacci trees are defined as follows:

[Figure: F_1 and F_2 are drawn as small trees; F_{n+2} is a root whose two
subtrees are F_n and F_{n+1}.]


3.  (0   In an AVL tree of n nodes, what is the length of the longest path
from a leaf to the root?

4.  (3) In Cases 2 and 3, why must the heights of the subtrees A, B, C and
D be as indicated in the text?

5.  (10) Explain in detail why Cases 1, 2, and 3 and their symmetric variants
are exhaustive.

6.  (15)  Write  the  necessary  steps  to  handle  Case  2  and  its  symmetric 
variants. 

7.  (15)  Write  the  necessary  steps  to  handle  Case  3  and  its  symmetric 
variants. 

8.  (M25)  Prove  that  Algorithm  I  works  correctly. 

9.  (M20) Analyze the time required for Algorithm I in the worst case. Show
that the time is proportional to log n where n is the number of nodes in
the tree; that is, determine constants \alpha_1, \alpha_2, \beta_1, \beta_2
such that if T(n) is the time to add a node to a tree of n nodes, then

    \alpha_1 \log_2 n + \beta_1 < T(n) < \alpha_2 \log_2 n + \beta_2.

10. (HM30) Analyze the expected time required for Algorithm I. Show that it
too is proportional to log n.

11.  (20)  Design  an  algorithm  to  delete  a  leaf  from  an  AVL  tree  so  that  the 
tree  retains  the  AVL  property. 

12.  (30)  Extend  your  algorithm  in  Exercise  11  to  delete  any  node  of  an  AVL 
tree  so  that  the  tree  retains  the  AVL  property.   Analyze  the  time  required 
for  the  algorithm  (it  can  be  done  in  time  proportional  to  log  n).   Prove 
that  the  algorithm  works  correctly. 

13. (40) Design an efficient algorithm to merge two AVL trees so that the
result is an AVL tree. Analyze the time required by your algorithm.